Compiler Introduction
+ Compiler Definition
A compiler is a program that reads a program written in one programming language (the source language) and translates it into an equivalent program in another programming language (the target language).
- Input: source program
- Source Code Analysis (front end)
- Intermediate Code Generator 中间代码发生器
- Synthesis (back end) Output: target program
Analysis Front end
Analysis comes in three phases:
- Lexical analysis
- Syntax analysis
- Semantic analysis
[[02-Compiler Lexical Analysis|Lexical Analysis]] 词法分析
The program does lexical analysis is called lexer or scanner.
- Reads the input program character by character as a stream.
- Splits the stream into lexemes, means “lexical elements”.
- Classifies each lexeme into a category of lexemes, called token.
We use regular expressions to do lexical analysis.
Example
num=1+23;
No. | Lexeme | Token |
---|---|---|
1 | num | identifier |
2 | = | ass_opt |
3 | 1 | int_literal |
4 | + | plus_opt |
5 | 23 | int_literal |
6 | ; | semicolon |
Tokens
Here is a list of common tokens:
- identifier: names the programmer chooses, like variable names; 变量名
- keywords: names already in the programming language, like
if
;关键字 - separator: punctuation characters and paired-delimiters, like
{}
; - operator: symbols operate on arguments and produce results, like
+
; - literal: numeric, logical, textual, reference literals, like
true
,123
;字面量 - comment
[[03-Compiler Syntax Analysis|Syntax Analysis]] 句法分析
The program which does syntax analysis is called a parser.
- Uses a grammar to analyze the form of tokens.
- And groups the tokens into a nested hierarchical structure.
- The structure is called a parse tree.
- The parse tree shows the structure of the program.
- Internal vertices are called non-terminals. Leaves are called terminals.
Same as using English grammar to check if the nouns, verbs, adjectives, etc., in a sentence are in the correct order.
Sometimes the parse tree can be extremely large.
- To simplify the structure, we can remove the internal vertices and let the operators be the parents.
- This is called syntax tree.
[[04-Compiler Semantic Analysis|Semantic Analysis ]]语义分析
- It takes a parse tree as input.
- Checks the “meaning” of the program.
- Searching errors in the program.
- For example,
- undefined variables,
- uninitialized variables,
- multiple declaration,
- types of variables for operations and functions, etc.
Semantic Analysis also does type conversions for many programming languages.
float x=3.5;
int y=x;
the compiler adds a type conversion on